A scalable intelligent non-content-based spam-filtering framework

Authors

  • Yong Hu
  • Ce Guo
  • Eric W. T. Ngai
  • Mei Liu
  • Shifeng Chen
Abstract

Designing a spam-filtering system that runs efficiently on heavily loaded servers is particularly important to widely used email service providers (ESPs) such as Hotmail, Yahoo, and Gmail, which must handle millions of emails every day. The two primary challenges these companies face in spam filtering are efficiency and scalability. This study develops an efficient and scalable spam-filtering framework for heavily loaded email servers. We propose an Intelligent Hybrid Spam-Filtering Framework (IHSFF) that detects spam by analyzing only email headers. This framework is especially suitable for very large email servers because of its efficiency and scalability. The proposed filtering system may be deployed alone or in conjunction with other filters. We extract five features from the email header: "originator field", "destination field", "X-Mailer field", "sender server IP address", and "mail subject". Email subjects are converted to numeric form using an n-gram-based algorithm for better performance. Using real-world data from a well-known ESP in China, we test the model with various machine-learning algorithms. Experimental results show that the framework using the Random Forest algorithm achieves good accuracy, recall, precision, and F-measure. With the addition of the MetaCost framework, the model performs stably and incurs small costs in various cost-sensitive scenarios. © 2010 Elsevier Ltd. All rights reserved.
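As a rough illustration of the header-only approach described in the abstract, the sketch below pulls the five named fields from a raw message and turns the subject into a fixed-length numeric vector. The abstract does not specify the n-gram algorithm, so the hashed character n-gram count used here is an assumption, as are the function names `extract_header_features` and `subject_ngram_vector`, the choice of the earliest `Received` header as the source of the sender server IP, and the parameters `n` and `dims`.

```python
import re
import zlib
from email import message_from_string
from email.utils import parseaddr


def extract_header_features(raw_email: str) -> dict:
    """Collect the five header fields named in the abstract (hypothetical helper)."""
    msg = message_from_string(raw_email)

    # Assumption: the bracketed IPv4 address in the last-listed (earliest)
    # Received header approximates the sender server IP address.
    sender_ip = None
    received = msg.get_all("Received") or []
    if received:
        match = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", received[-1])
        if match:
            sender_ip = match.group(1)

    return {
        "originator": parseaddr(msg.get("From", ""))[1].lower(),
        "destination": parseaddr(msg.get("To", ""))[1].lower(),
        "x_mailer": msg.get("X-Mailer", ""),
        "sender_ip": sender_ip,
        "subject": msg.get("Subject", ""),
    }


def subject_ngram_vector(subject: str, n: int = 3, dims: int = 16) -> list:
    """Count character n-grams of the subject into `dims` hashed buckets.

    This is a generic hashing scheme chosen for illustration, not the
    algorithm from the paper.
    """
    text = subject.lower()
    vector = [0] * dims
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        # crc32 is deterministic across runs, unlike the built-in hash().
        vector[zlib.crc32(gram.encode("utf-8")) % dims] += 1
    return vector
```

The resulting dictionary and vector could then be encoded and passed to any classifier; the abstract reports results for Random Forest, optionally wrapped with MetaCost for cost-sensitive settings.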


Similar resources

A Scalable Spam Filtering Architecture

The proposed spam filtering architecture for MTA servers is a component based architecture that allows distributed processing and centralized knowledge. This architecture allows heterogeneous systems to coexist and benefit from a centralized knowledge source and filtering rules. MTA servers in the infrastructure contribute to a common knowledge, allowing for a more rational resource usage. The ...


Hybrid spam filtering for mobile communication

Spam messages are an increasing threat to mobile communication. Several mitigation techniques have been proposed, including white and black listing, challenge-response and content-based filtering. However, none are perfect and it makes sense to use a combination rather than just one. We propose an anti-spam framework based on the hybrid of content-based filtering and challenge-response. A messag...


A Dynamic Reputation Service for Spotting Spammers

This paper presents the design, implementation, evaluation, and initial deployment of SpamSpotter, the first open, large-scale, real-time reputation system for filtering spam. Existing blacklists (e.g., SpamHaus) have trouble keeping pace with spammers’ increasing ability to send spam from “fresh” IP addresses, and filters based purely on content are easily evadable. In contrast, SpamSpotter dy...


Spam Classification Based on E-Mail Path Analysis

Email spam is the most effective form of online advertising. Unlike telephone marketing, email spamming does not require huge human or financial resources investment. Most existing spam filtering techniques concentrate on the emails’ content. However, most spammers obfuscate their emails’ content to circumvent content-based spam filters. An integrated solution for restricting spam emails is nee...


Arabic Spam Filtering using Bayesian Model

Many of us are concerned about an onslaught of spam email. Spam has become a major problem for email communication. The number of spam mails is increasing daily: studies show that 45-50% of all current email communication is spam, and that this share will reach up to 70% in coming years. The volume of non-English-language spam is increasing day by day. The motivation fo...



Journal title:
  • Expert Syst. Appl.

Volume 37  Issue 

Pages  -

Publication date 2010